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ABSTRACT 

We present a new Unbiased Minimal Variance (UMV) estimator for the purpose of 
reconstructing the large-scale structure of the universe from noisy, sparse and in- 
complete data. Similar to the Wiener Filter (WF), the UMV estimator is derived by 
requiring the linear minimal variance solution given the data and an assumed prior 
model specifying the underlying field covariance matrix. However, unlike the WF, the 
minimization is carried out with the added constraint of an unbiased reconstructed 
mean field. The new estimator does not necessitate a noise model to estimate the 
underlying field; however, such a model is required for evaluating the errors at each 
point in space. The general application of the UMV estimator is to predict the values 
of the reconstructed field in un-sampled regions of space (e.g., interpolation in the 
unobserved Zone of Avoidance), and to dynamically transform from one measured 
field to another (e.g., inversion of radial peculiar velocities to over-densities). Here, 
we provide two very simple applications of the method. The first, is to recover a ID 
signal from noisy, convolved data with gaps, e.g., CMB time-ordered data. The second 
application is a reconstruction of the density and 3D peculiar velocity fields from mock 
SEcat galaxy peculiar velocity catalogs. 

Key words: cosmology: theory-large-scale structure of universe, methods-statistical 



1 INTRODUCTION 

Mapping the distribution of galaxies and their peculiar ve- 
locity field constitutes a major research area in modern as- 
tronomy setting both the observational and theoretical foun- 
dations of cosmology and, in particular, of large scale struc- 
ture (LSS). Within the framework of gravitational insta- 
bility, the large scale galaxy distribution offers a probe of 
the nature of the primordial perturbation field, and can be 
used to set strong constraints on the values of cosmological 
parameters. However, since astronomical observations give 
only incomplete and noisy information on the real universe, 
the recovery of the underlying signal from these observations 
can be a non-trivial task, forcing one to resort to regulariza- 
tion methods, e.g.,, Wiener filtering (Wiener 1949; Rybicki 
& Press 1992; Zaroubi et al. 1995) and the Maximum En- 
tropy algorithm (Gull 1989). 

In particular, the WF has been widely applied to galaxy 
surveys (Lahav et al. 1994; Fisher et al. 1995; Schmoldt 
et al. 2000), CMB studies (Bunn et al. 1994; Tegmark 
& Efstathiou 1997, Bouchet & Gispert, 1999), and galaxy 
peculiar velocity catalogs (Hoffman et al. 2000; Zaroubi 
2000; Zaroubi et al. 1999 & 2001a). The WF provides an 
optimal estimator of the underlying field in the sense of a 
minimum- variance solution given the data and an assumed 
prior model (Wiener 1949; Press et al. 1992). The prior 
defines the data auto-correlation and the data-field cross- 



correlation matrices. In the case where the data is drawn 
from a random Gaussian field, the WF estimator coincides 
with the conditional mean field and with the most probable 
configuration given the data (see Zaroubi et al. 1995). 

Although the application of the WF is very simple and 
has proven to be useful for many purposes, it is easy to 
show that the estimator is intrinsically biased, often in a 
scale dependent manner. The main cause of this bias stems 
from the modulation introduced by the Signal/(Signal + 
Noise) weighting it invokes. This drawback has prevented 
the use of the Wiener reconstructed maps in many areas, 
e.g., power spectrum estimations, bias parameter extrac- 
tion from galaxy peculiar velocity data comparison with 
Galaxy surveys data, etc. To account for this bias in the 
power spectrum estimation a correction factor is often ap- 
plied to the Wiener reconstructed signal (Rybicki & Press 
1992; Tegmark & Efstathiou 1997). 

In this paper, we propose a new linear unbiased mini- 
mal variance (UMV) estimator that is designed to avoid the 
intrinsic bias that exists in the WF. This is achieved by solv- 
ing the minimization equation subject to the constraint of 
unbiased mean underlying field. To test the UMV estimator 
we apply it: 1- To recover a ID time series from convolved 
and noisy data with gaps. 2- To mock SEcat peculiar ve- 
locity data set, a combination of the SFI (Giovanelli et al. 
1998) and the ENEAR (da Costa et al. 2000) galaxy pecu- 
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liar velocity catalogs, where we reconstruct the distribution 
of the true density and velocity fields from which the mock 
catalog is constructed. 

The outline of this paper is as follows. The method 
is presented in § ^. In § two simple extra regularization 
methods are discussed. § ^1 shows how constrained realiza- 
tions could be produced within the UMV framework. The 
method is tested using artificial data based on simulations 
in § ^. The results are discussed and the conclusions are 
summarized in § ^. 



2 THEORY 

2.1 Derivation of the UMV estimator 

Consider the case of a set of observations, or measurements, 
performed on an underlying field s = {s a }(a = 1, N), or 
on any field linearly related to s, which yields a set of data 
points, d = {di}(i — 1,...,M). In particular, we are inter- 
ested in measurements that can be modeled mathematically 
as a linear convolution or mapping of the underlying field, 



([rr+ +\HdU) = <[(s- 



S )(s- 



_UMV\ + 



AHd]AT>(4) 



: o + e = Rs + e 



(1) 



where o = Rs and R is an M x N matrix which represents 
the response or point spread function (hereafter RF) and e 
is the noise vector associated with the data. The RF will be 
treated as any function that linearly connects the underlying 
signal to the data, be it blurring or smoothing introduced by 
the measurement, some theoretical relationship between two 
fields, or any other linear transformation on the underlying 
signal. 

In principle, reconstructing s can be accomplished by 
inverting equation eq. |l| However two main obstacles usually 
prevent one from pursuing this approach. First, the number 
of independent data points is usually much smaller than 
the number of underlying degrees of freedom. Second, the 
presence of noise can render such a direct inversion unstable 
and the obtained results meaningless. Due to these potential 
difficulties one is often forced to resort to some statistical 
regularization techniques {e.g., WF) in order to solve eq. Q 

Similar to the requirements of the derivation of the WF 
we assume the prior knowledge of the signal correlation func- 
tion 



S= <ss+), 



(2) 



where s is the complex conjugate of the transpose of the 
underlying signal, and angled brackets denote the signal en- 
semble average. Notice, that there have been no assumptions 
made regarding the actual probability distribution function 
from which the field s is drawn. We define the unbiased 
minimal variance estimator, s = Hd where H is an 
TV x M matrix that minimizes the variance of the residual 
,n while satisfying the constraint, 



r = s — s 



[s UMV U = [HdK = s. 



(3) 



The ensemble average, [. . .]at, is an ensemble average over 
noise realizations and is very different from the ensemble 
average, {...), which denotes an ensemble average over sig- 
nal realizations. This distinction between the two ensemble 
averages will be used only in this section, in the rest of the 
paper we use only the signal ensemble average. 
In short, we are seeking H that minimizes, 



where A is a Lagrange multiplier. The ensemble average over 
noise realizations precedes the one over signal realization. 
The order of averaging over ensembles is very important 
since if it was reversed the contribution of the term Hd will 
be null. 

The constraint, introduced in eq. |^, assumes that the 
data is unbiased, namely the errors are random, and there- 
fore requires that the estimator does not alter the value of 
the measured data points but rather it forces the field to 
retain the measured values at their appropriate locations. 
However, this requirement does not guarantee an unbiased 
variance of the reconstruction, on the contrary one expects 
that the variance of the reconstructed field is some com- 
promise between the (Signal + Noise) variance of the data 
points and the assumed variance of the underlying signal. 

Carrying out the minimization of eq. ^ with respect to 
H one obtains an equation that together with eq. |^ are used 
to solve for H and A. The solution yields the UMV estimator, 



= (so+xoo+r'd. 



(5) 



The Lagrange multiplier, A, is roughly proportional to the 
Noise/ (Signal+Noise) making it dominant when the noise 
is dominant and small when the Signal/Noise 2> 1. 

In the absence of a response function that operates on 
the signal, namely, R = I (where I is the unity matrix), the 
reconstructed signal at the location of the data points is 
identical to the data measured values which is consistent 
with the constraint given in Eq. []. In the rest of space the 
degrees of freedom are recovered by interpolating the data 
points in a manner consistent with the correlation assumed 
in the underlying theory. 

Mathematically, the difference between the WF and the 
current estimator is that in the former the term (oo + ) is 
replaced by (dd + ), a matrix that includes the noise corre- 
lations, an addition that accounts for the signal suppression 
which renders the WF mean field biased. 

The variance of the field estimated in Eq. |E] is, 

{s UMV s UMV + ) = 

(so + )(oo+>- 1 «oo+) + N) (o+o)- 1 (os + > ,(6) 

where N is the noise correlation matrix. The variance of the 
residual, 



(rr+) = (ss+) - (so + )(oo + )" 1 (os + ) 

+ <so + )<oo + r 1 N<oo + }- 1 (os + ) 



(7) 



As a simple example, assume that R = I and that the 
field is estimated in the exact locations of the data points, 
the variance in eq. |^ simply reduces to (ss + ) + N, recovering 
the power spectrum of the signal + noise at those points. 
The variance of the residual (eq. ^) at the location of the 
data points is simply reduced to the correlation matrix of the 
noise, N. In addition, when the data points are uncorrelated 
with the rest of the underlying degrees of freedom then the 
reconstructed values, at locations different from those of the 
data points, are zero (eq. []) and the variance of the residual 
at those same locations, as obtained from eq. [?], is simply 
the underlying prior correlation. 

Substitution of o = Rs in equation ^ yields, 



(ss + R + )(Rss + R + ) _1 d, 



(8) 
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which one could be tempted to further simplify to obtain 
s umv = ( ss +)( ss +)-iR-id = R _1 d. Normally however, 
the matrix R is not square but rather has a larger number 
of rows than columns and its inverse is not unique and the in- 
verse of its transpose, R + , does not exist at all. Therefore, in 
most cases carrying the simplification further is mathemat- 
ically incorrect. This simplification is possible if the inverse 
of both R and R + exist, which is only true if the number 
of data points is identical to the number of degrees of free- 
dom in the underlying signal or when the signal correlation 
function is a Dirac-delta function. 

However, when the simplification could be performed 
the UMV estimator is equivalent to direct inversion. Since 
direct inversion in the presence of noise could produce very 
large uncertainties in the reconstructed signal one should 
resort to extra regularization that reduces the variance to a 
manageable size, an example of such an extra regularization 
is given in § ^. 

To summarize, the regularization strength of the UMV 
estimator stems from the cross talk between the underly- 
ing signal and the data points at different locations. The 
stronger the correlation is, the more advantageous the use 
of the UMV. With the absence of such a cross talk the num- 
ber of degrees of freedom that one can reconstruct is iden- 
tical to the number of data points, and the UMV estimator 
is reduced in this case back to, R _1 , namely equivalent to 
direct inversion, a case that requires extra regularization in 
order to reduce the, often huge, uncertainty introduced by 
the noise. 



3 EXTRA REGULARIZATION 

As shown in the previous section, the UMV could be equiva- 
lent to direct inversion, therefore, while it produces an unbi- 
ased estimator of the underlying field, it gives a "minimum 
variance" that is too large to be useful. In contrast, the WF 
filter produces a manageable variance but a large degree of 
bias. Hence, the UMV alone is often insufficient to stabilize 
the deconvolution and consequently it must be either mod- 
ified, supplemented with extra regularization, or replaced 
with another method. 

Of course, to fully solve the inversion problem one might 
want to use more sophisticated non-linear methods such as 
Maximum Entropy (Gull f989), Pixons image restoration 
algorithm (Pifia & Puetter 1993), etc. These methods, how- 
ever more advanced, are much more computationally expen- 
sive and complicated to apply. 

In this paper we choose to modify the UMV inversion 
method with extra regularization that is both s imp le and 
easy to integrate in the algorithm. In subsections |3.l| & [3.2[ 
two simple extra regularization methods are introduced, the 
Singular Value Decomposition (SVD) algorithm (e.g., Press 
et al. 1992), and a WF-like regularization. These meth- 
ods are basically applied to find the vector a, defined as 
a = O : d = (oo + ) _1 d (see Eq. |). The difficulty in finding 
a could be due to the inversion instability of O or due to the 
noise in matrix a. In both methods suggested here this diffi- 
culty is solved by, in effect, adding elements to the diagonal 
of O or to its eigen-values so that it stabilizes the inversion 
of O and at the same time suppress the noise contribution 
of d to the vector a. 



3.1 The Singular Value Decomposition Algorithm 
as a Regularizer 

The following presentation follows the one in Zaroubi et al. 
(1995). Mathematically the solution of Eq. ^| involves the 
solution of the equation 



Oa = d, 



(9) 



for the unknown vector a. matrix. 

The SVD algorithm basically decomposes the positive 
definite matrix O*^ into a multiplication of three matrices, 
O = U diag{uii} U + , where the set {wt} is referred to as 
the collection of singular values, or the eigenmodes, and the 
columns of the matrix U are orthogonal and proportional to 
the eigenvectors of O. The inversion, after decomposition, 
is straightforward and gives O 1 = U diag{l/wi} U+. For- 
mally speaking, Eq. has a unique solution if and only if 
O is a non-singular matrix, namely if Wi ^ for all i. How- 
ever a meaningful solution can be obtained even in the case 
where O is singular, by requiring the solution to minimize 
the norm of the residuals, |Oa — d|. Such a solution is ob- 
tained by substituting 1/tVi = in the expression for the 
inverse for any Wi — (Press et al. 1992). 

The question arises in a particular problem of setting 
the lower limit of the singular values, below which the in- 
verse values are set to zero. In general, the singular values 
measure the amount of 'information' carried by each mode 
in the problem (Press et al. 1992), namely the small singular 
values do not have significant contribution to reconstruction, 
nevertheless, they can destabilize the inversion. As an exten- 
sion of the ideal case of Wi = 0, we impose a cutoff on the 
small singular values in order to maintain stability. Often, 
the structure of the sorted spectrum of singular values con- 
tain a very sharp drop normally appearing as a 'knee'. The 
location of the 'knee' usually determine the singular values 
that contain sufficient information. A simple cutoff at the 
location of the 'knee' normally does the trick and stabilizes 
the inversion. 

Another possible criterion for the choice of the cutoff 
would be by expanding the matrix O in terms of signal- 
to- noise eigenmodes (Bond 1995). This method allows a si- 
multaneous diagonalization of the Matrix O and the noise 
matrix N which in turn allows the usage of some signal-to- 
noise ratio threshold (typically 1) in order to set the cutoff. 



For specific application see the ID example in § 5.1, espe- 
cially the expansion shown in Fig. ^ (see also the the example 
discussed in Zaroubi et al. 1995). 



3.2 Wiener Filter-like Regularizer 

The WF by itself is a very robust and efficient regularizer. 
The main reason for this stems from the relatively high val- 
ues of the diagonal terms in the matrix (dd + ). These high 
values usually come from the noise contribution, especially 
if it is uncorrelated (e.g., white noise). This aspect is, of 
course, also responsible for the suppression of the Wiener 
reconstructed signal. 

From the point of view of the SVD algorithm the sta- 
bilization effect caused by the noise diagonal elements is 

* The matrix O is an auto-correlation matrix therefore it is a 
square positive definite matrix. 
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quite obvious, especially in the light of the following ex- 
ample. Consider the simplest case of white Gaussian noise 
where the noise correlation matrix is proportional to the 
unity matrix, I. With the application of the SVD algorithm 
the noise contribution is still proportional to the unit matrix 
and is the same for every singular value, this means that the 
singular values can be as small as the noise and non of them 
has a value of 'zero', therefore, the matrix is naturally stable 
an no extra stabilization is normally needed. 

In order to utilize the stabilization aspect of the WF 
on the one hand, while avoid the signal suppression aspect 
on the other, we add a very small white noise-like contri- 
bution to the correlation matrix O. Naturally, one should 
seek the smallest possible addition in order to suppress the 
recovered signal as little as possible. The choice of the am- 
plitude of the addional noise term could be guided by the 
same signal-to-noise cutoff criterion discussed in the previ- 
ous section (§ refsec:svd-regularization). This is, of course, 
problem dependent and should be carried out with caution. 

An example of the application of this approach is dis- 
cussed in § 



This kind of noisy constrained realization could be very 
useful for cases when one has gaps in the data that one would 
like to fill, not only in a manner consistent with the data but 
also with the same power spectrum of the Signal+Noise. 



5 APPLICATIONS 

The UMV estimator can be applied to many areas of LSS 
and CMB for: 1) interpolating between data points, e.g., 
filling in gaps in the time-ordered data to help map-making, 
bridging the Zone of Avoidance and other uncovered areas in 
nearly full sky catalogs, etc.; 2) stabilize inversions used to 
transform one dynamical field to another, e.g., from radial 
peculiar velocity field to density field; or from redshift to 
real density. 

Here we present two examples, the first is a reconstruc- 
tion from one dimensional time series of noisy convolved 
signal with gaps, e.g., CMB time-ordered data. The sec- 
ond example shows how this method could be applied to 
reconstruct the 3D density and peculiar velocity fields from 
observed galaxy radial peculiar velocity catalogs. 



4 CONSTRAINED REALIZATIONS 

Hoffman & Ribak (1991) showed that, within the framework 
of Gaussian random fields and a given prior, any realization 
can be split into two parts: the mean field, or the WF field, 
which is determined by the constraints (data) and the resid- 
ual field which is a Gaussian random field. Thus, one can 
make a constrained realization of the underlying field given 
the assumed prior model and the data. 

With the UMV reconstructed field one can produce con- 
strained noisy realizations at every point in space only if 
the noise in the observed quantity is uncorrelated. The for- 
mulation here is quite similar to the constrained realiza- 
tions procedure of Hoffman & Ribak (1991); one produces a 
random realization of the noisy underlying field and noise, 
gNoisy ^_ g _|_ then sam ple it in the same way the actual 
data is obtained, 

d = Rs + I. (10) 

Here, s and e are random realizations of the underlying 
field and the statistical uncertainties, respectively. The noise 
in the random realization of the data e is related to the noise 
in the random realization of the underlying noisy field a 
through the response function, 

e = R5. (11) 

A constrained realization of the field given the data is 
given by: 

s Noisy = s Noisy + H(d - d). (12) 

Here s Nolsy is the constrained noisy realization. 

Note that if the noise, 1, is correlated one cannot use 
this formalism as the UMV takes into account only the cor- 
relation in the underlying signal. In such a case, one can 
think of a different approach, where the noise correlation is 
included in the estimator, in this case one can expand the 
UMV estimator to have the form, 

s UMV = <so+ + ae+)(oo+ + ee+^d. (13) 



5.1 Time Series Example: Deconvolution of Noisy 
Data With Gaps 

To test the performance of the UMV estimator we proceed 
with a simple one dimensional example. Let s be a random 
Gaussian time series, with a known correlation function, 
which we would like to measure in the time range [0 — 200] 
(the signal and time units are arbitrary) . The measurement 
involves a convolution of the signal with Gaussian window 
of 5 time units width, the convolution is describe by the 
matrix C. The measurement procedure uniformly samples 
the signal at about 100 positions, except for the time range 
of [90 — 100] where there is a gap in the data. An instru- 
mental white noise, e, with standard deviation three times 
larger than the signal standard deviation is added. Math- 
ematically the data is connected to the underlying signal 
with d = Cs + e. The heavy-solid line in Fig. hi shows the 
underlying signal and the connected diamonds represent the 
measured data. 

This specific example is typical to what one obtains 
in CMB time-ordered data type of measurements where the 
matrix C is the instrument's point spread function, the noise 
level reflects the detectors sensitivity and the gaps are drop- 
outs in the data stream. 

The UMV and WF estimators for this case are, 

s c/w = (ss + C + )(Css+C + >- 1 d, (14) 
and, 

s WF = (ss + C + )(Css + C+ + e 2 !}- 1 ^ (15) 

Due to the large correlation length in this example the 
matrix O = (Css + C + ) in equation |l4] is unstable for in- 
version, therefore as previously discussed, one has to apply 
an extra regularization scheme. The heavy-solid line in Fig- 
ure ^| shows the sorted spectrum of the singular values (wi) 
of the matrix O versus the mode number. The very abrupt 
drop of the singular values around mode number 10 indi- 
cates that the information content of the rest of the singular 
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Observed points E 




200 



Figure 1. A one dimensional reconstruction example. The heavy- 
solid line shows the underlying signal A(t) as a function of time; 
both time and amplitude has arbitrary units. The underlying sig- 
nal is convolved with a Gaussian window function with width of 
5 time units; the convolved signal is then uniformly sampled with 
the exception of a gap in the time range of 90-100. A random 
noise was then added to produce the 'data' points shown with 
the diamond-shaped connected points; the signal- to- noise ratio 
in this example is 3. The heavy-dashed line shows the UMV re- 
constructed signal, while the heavy-dotted line shows the Wiener 
reconstructed signal 



Full WF Regularization 

With "WF — like Regularization 

Without Regularization 




10 100 
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1000 



Figure 2. Sorted spectrum of singular values of the matrix O. 
The heavy-solid line shows the singular values of O without reg- 
ularization. The heavy-dashed line shows the singular values of 
O after adding a diagonal constant noise matrix, the constant 
is 0.005 of the noise rms. The 'knee' around singular values of 
10 -9 is caused by numerical noise. For comparison, the dotted 
line shows the singular values of the full WF regularization 
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Figure 3. The solid line shows the same underlying signal as the 
in fig. [IJ To this signal we add 500 noise realizations to produce 
Monte-Calors of the 'observed' data. The dashed line shows the 
mean of the 500 UMV reconstructions of these realization. The 
dotted line shows the mean of the Wiener reconstruction of the 
each of each of the 500 data realizations 



values is very small and that they are essentially responsi- 
ble for the inversion instability. To stabilize the in vers ion, 
we adopt the regularization method described in 



3.2 



and 

add a diagonal constant with 0.005 of the noise contribu- 
tion. The heavy-dashed line in Figure |^ shows the sorted 
spectrum of singular values of the regularized O. The two 
lines are identical for the large singular values and depart 
at small singular values. For comparison, the dotted line in 
Figure ^| shows the singular values of the matrix used in the 
WF application, namely, O + N. 

The heavy-dashed line in Fig. [j] shows the UMV recon- 
struction of the signal that the follows directly from calcu- 
lating s UMV from equation ju| The dotted line shows the 
WF reconstructed signal as obtained from equation [l5|. The 
UMV reconstruction follows the data points while taking 
into account the deconvolution and the prior for the sig- 
nal temporal correlations. The WF reconstruction is much 
smoother and has smaller variance than the underlying sig- 
nal. 

To demonstrate the differences between the two recon- 
structions, fig. ^ shows an average of 500 reconstructions of 
data realizations with the same underlying signal but dif- 
ferent noise Monte-Carlos, the unbiased nature of the UMV 
and the biased nature of the WF reconstructions are evi- 
dent. In each UMV reconstruction we have applied the afore- 
mentioned WF-like regularization, the amount of bias intro- 
duced by this procedure is negligible, while the bias for the 
full WF application is not. 



5.2 Reconstruction from Peculiar Radial Velocity 
Data 

Next, we present an example of an application of the UMV 
estimator to reconstruct the density and 3D velocity fields 
from galaxy radial peculiar velocity data. The data set used 
here is a mock catalog that mimics the SEcat galaxy pecu- 
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liar velocity catalog (Zaroubi 2000), which is a combination 
of the grouped and Malmquist biases corrected SFI (Gio- 
vanelli et al. 1998) and ENEAR (da Costa et al. 2000) 
galaxy peculiar velocity catalogs. The catalog consists of 
about 2050 objects (« 1300 from SFI and « 750 from EN- 
EAR) for which it provides radial velocities and inferred dis- 
tances with errors, on the order of ss 19% of the distance per 
galaxy. The sampling is reasonably homogeneous and covers 
the whole sky outside the ZoA, the radial selection function 
is uniform out to » 5500 kms -1 . The ZoA is about ±15°. 
The SEcat catalog contains distances and peculiar veloci- 
ties of both late-type and early-type galaxies, and therefore 
has the advantage of sampling both high and low density 
environments minimizing possible biases that may affect re- 
construction from catalogs based on a single population of 
galaxies. 

The mock catalog is produced as follows, we first gener- 
ate a random linear Gaussian realization of density and ve- 
locity with a ACDM power spectrum (with Q. m = 0.3 & A = 
0.7). Then the mock catalog is produced so that the loca- 
tions of the data points in the mock catalog are the same as 
for those in the real SEcat catalog, however, the values of 
the radial peculiar velocities are taken from a random Gaus- 
sian realization of the underlying fields. The original density 
field is used for comparison with the reconstructed one. 

In this example the data points are given as a set of 
observed radial peculiar velocities u° sampled at positions 
ri with estimated errors a, assumed to be uncorrelated. The 
observed velocities are thus related to the true underlying 
velocity field v(r), or its radial component m at n, via 

u° = v(rj) • fi + ei = m + e t . (16) 

We assume that the peculiar velocity field v(r) and 
the density fluctuation field 5(r) are related via linear 
gravitational-instability theory, S = /(fl) _1 V • v, where 
f(Q) « fi ' 6 and fi is the mean universal density param- 
eter. Under the assumption of a specific theoretical prior for 
the power spectrum P(k) of the underlying density field, we 
can write the UMV estimator of the fields as 

v VMV (r) = (v(r)ui\(uiUj\ u°, (17) 

and, 

6 UMV (r) = /s(r) Ui Vu iU ^ l u°. (18) 

Assuming linear theory and that the velocities are drawn 
from a Gaussian random field, the two-point velocity- 
velocity and density- velocity correlation tensors (bracketed 
quantities in eqs. [l^&. ^) are readily calculated. The calcu- 
lation of these matrices is discussed elsewhere (Gorski 1988; 
Zaroubi et al. 1995,1999). 

We wish to test two aspects of the reconstruction. 
First, whether the coverage within the assigned area is 
good enough for a faithful recovery of the underlying signal. 
Second, whether the noise level allows a reasonable (high 
Signal/Noise) reconstruction within the 60/i _1 Mpc sphere. 
The success of the reconstruction is demonstrated in Fig- 
ure ^. The top-left panel shows the density of the underly- 
ing field, used to construct the mock SEcat catalog, at the 
Z = 0/i _1 Mpc plane, smoothed with 9/i _1 Mpc Gaussian, 
within a sphere of radius 60 /i _1 Mpc; this is the target map 
which we are attempting to recover. 



In order to test the quality of the coverage we construct 
a noise-free mock SEcat catalog which has accurate radial 
velocities at the locations of the data points. The top-right 
panel in Figure ^ shows the density reconstruction from the 
noise-free catalog, this map tests mainly the uniformity of 
the catalog within a sphere of 60/i _1 Mpc. The very good 
agreement between this map and the target map shows that 
the coverage of the SEcat catalog is excellent. 

Next, we test the effect of noise on the reconstruction; 
here we construct a noisy mock SEcat catalog where the 
added noise is realistic and corresponds to the quoted Tully- 
Fisher and D n — a errors in the real catalog. The lower- 
left panel shows density reconstruction from a noisy mock 
realization of the SEcat catalog. The agreement between this 
map and the target map is also good (see Figure |^), though 
there are some areas where the recovered signal is not very 
satisfying, especially towards the edge of the sphere. 

In order to test where the recovered signal is reliable 
we reconstruct the density field from 30 mock catalogs, with 
the same underlying field but with different noise realiza- 
tions, and compare them with the target map. The lower- 
right panel shows contours of the signal-to-noise ratio, with 
spacing of 1 and with heavy-solid contour denotes a signal- 
to-noise ratio of unity. The signal-to-noise ratio in some 
areas in the map can get up to 15 and in most of the map 
the signal-to-noise ration is greater than a few. However, if 
the underlying density is of order zero the signal-to-noise 
ratio gives a misleading impression about the quality of the 
reconstruction as in this case it will be always of the order 
of zero. Therefore, the lower-right panel also shows the area 
(shaded) within which the error in the density-contrast is 
less than 0.45. 

To demonstrate the stability of the inversion, all the 
panels, with the exception of the lower-right panel, in fig- 
ure ^ show reconstructions similar to the ones shown in the 
lower-left panel of fig |i| but with different error realizations. 
For comparison, the lower-right panel in fig. ^ shows the 
Wiener reconstructed density field from one of the noisy 
mock catalogs (the one used on the top left panel) , note 
that here the WF reconstruction roughly recovers the fea- 
tures in the target map; however, the amplitude of the den- 
sity is suppressed throughout the plane, reflecting the biased 
nature of the Wiener reconstruction. 

Figure |^ shows a scatter plot of the original vs. recon- 
structed densities within the whole reconstructed sphere. 
The densities are chosen from areas within which the errors, 
estimated from Monte Carlos of noisy mock SEcat catalogs, 
are less than 0.2. The left panel shows the quality of the re- 
construction from noise free catalog. While the right panel 
shows the reconstruction quality from noisy catalog. As ex- 
pected, the scatter in the right panel is larger. The measured 
slope is slightly smaller than 1 (w 0.98 ± 0.03, the quoted 
uncertainty is the la error), nevertheless the agreement be- 
tween the original and reconstructed is excellent. 



6 DISCUSSION & SUMMARY 

A general framework of linear estimation and prediction by 
minimal variance subject to a linear constraint in the data 
has been introduced. The solution of the minimization prob- 
lem yields the UMV estimator, which is shown to be a very 
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Figure 4. Testing the method with mock SEcat data. Shown 
are maps of density in the Supergalactic plane, smoothed with a 
Gaussian window of 900 kms -1 , G9. Density contour spacing is 
0.1, the mean <5 = contour is heavy, positive contours are solid 
and negative contours arc dashed. Top-left panel shows the orig- 
inal mock Supcrgalactic-planc. Top-right panel is the reconstruc- 
tion from a noiseless mock catalog, which shows the uniformity 
of the sampling and the quality of the interpolation. Bottom- 
left panel shows a typical UMV reconstruction from noisy data. 
Bottom-right panel shows the signal-to— noise ratio with contour 
spacing 1., the heavy-solid line indicate signal-to-noise ratio of 
unity. The shading indicates regions where the error is less than 
0.45 



useful tool for reconstructing the large scale structure of the 
universe from incomplete, noisy and sparse data. The UMV 
estimator has been designed to overcome one of the main 
drawbacks of the WF which is that it predicts the null field 
in the absence of good data, i.e., in the limit of very poor 
signal-to-noise data the cosmological mean field is estimated, 
i.e., zero perturbation. In contrast, the UMV estimator does 
not alter the values of the reconstructed field at the locations 
of the data point, because it lacks the filtering aspect of the 
WF, instead it keeps the values of the measured data at 
the locations of the data points and interpolates between 
them in accordance with the correlation function assumed 
in the model. Like the WF the new estimator can be used for 
dynamical reconstruction, i.e., to reconstruct one dynamical 
field, e.g., over-density, from another measured field, e.g., ra- 
dial peculiar velocity. These two properties make the UMV 
estimator a very appealing tool for various applications in 
LSS and CMB problems. 

The regularization strength of the new estimator stems 
from the cross correlation between the signal and the data 
points at different locations. Which allows a reconstruction 
that is consistent with the data points and the correlations 
among them. Lacking such a correlation, the UMV estima- 
tor is equivalent to direct inversion. In this case an addi- 
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Figure 5. Shown are maps of density in the Supergalactic plane 
reconstructed from different Monte-Carlo realizations of the er- 
rors The smoothing window is a Gaussian of radius 900 kms -1 , 
G9. Density contour spacing is 0.1, the mean <5 = contour 
is heavy, positive contours are solid and negative contours are 
dashed. The bottom-right panel shows the Supergalactic plane 
WF reconstruction from one of the Monte-Carlo realizations. 



tional regularization is required. This issue could be viewed 
as follows. The UMV produces an unbiased estimate of the 
underlying signal subject to the minimal variance require- 
ment. This minimal variance could sometimes be too large to 
be useful. In comparison the WF produces an truly minimal 
variance but with a biased mean. Therefore, when apply- 
ing the UMV estimator, sometimes an extra regularzation 
is needed in order to, on the one hand, reduce the variance in 
the UMV reconstruction to manageable value, while keeping 
the reconstructed underlying signal as unbiased as possible, 
on the other. In this paper two extra regularization meth- 
ods, that are both simple and easily integrated in the UMV 
method, have been discussed. 

Constrained realizations of the underlying field (Hoff- 
man & Ribak 1991) will not be possible with the UMV es- 
timator. However, constrained realizations of a noisy under- 
lying field are possible. Such noisy constrained realizations 
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<5(0riginal) <5(0riginal) 

Figure 6. Shown is scatter plot of the underlying density vs. the 
reconstructed density. The densities chosen are from areas within 
which the reconstruction error is < 0.2, this error is determined 
using 30 mock SEcat catalogs. Left panel shows the quality of 
the reconstruction from a noise-free mock catalog. Right panel 
shows the quality of the reconstruction from a noisy mock SEcat 
catalog. 

could be very useful if one wishes to fill gaps in the data, e.g. , 
from balloon-born CMB measurements, with a realization 
that has the same assumed properties of the Signal+Noisc. 

An apparent difficulty arises from the fact that the cur- 
rent UMV reconstruction assumes linear gravitational in- 
stability, yet it is applied to a universe that is non-linear 
on scales smaller than a few Mpc's. To obtain non-linear re- 
construction of the underlying field, s n i, one can include the 
non- linear correlation by substituting (s n io + ) in eq. |H[ This 
of course would only take into account the contribution of 
the variance and ignore the contribution of higher moments 
but in many cases this will do. 

In one of the exampled presents here, we have ap- 
plied the method to galaxy radial peculiar velocity data 
and shown that the reconstruction is unbiased and trust- 
worthy within a very large region of the volume covered by 
the data. In this context the UMV reconstruction could be 
viewed as a compromise between the POTENT algorithm 
(Bertschinger & Dekel 1989; Dekel, Bertschinger & Faber 
1990), which assumes no regularization, and the WF, which 
relies too heavily on it. 

The UMV reconstruction of the over-density and the 
3D velocity field from galaxy peculiar velocity catalogs is 
suitable for bias parameter extraction from a comparison 
with the respective fields obtained from redshift galaxy cat- 
alogs like the PSCz (Saunders et al. 2000; Branchini et al. 
2000). The current estimator allows carrying out density- 
density and velocity-velocity comparisons using the same 
reconstruction technique (Zaroubi et al. 2001b). 
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